Privacy-aware publication and utilization of healthcare data
Open access to health data can bring enormous social and economic benefits. However, such access can also lead to privacy breaches, which may result in discrimination in insurance and employment markets. Privacy is a subjective and contextual concept, so it should be interpreted from both systemic and information perspectives to clearly understand potential breaches and their consequences. This dissertation investigates three popular use cases of healthcare data: 1) synthetic data publication, 2) aggregate data utilization, and 3) privacy-aware API implementation. For each case, we develop statistical models that improve the privacy-utility Pareto frontier by leveraging a variety of machine learning techniques, such as information-theoretic privacy measures, Bayesian graphical models, non-parametric modeling, and low-rank factorization. The dissertation shows that much utility can be extracted from health records while maintaining strong privacy guarantees and protecting sensitive health information. Electrical and Computer Engineering
Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations
As we move closer to real-world AI systems, AI agents must be able to deal
with multiparty (group) conversations. Recognizing and interpreting multiparty
behaviors is challenging, as the system must recognize individual behavioral
cues, deal with the complexity of multiple streams of data from multiple
people, and recognize the subtle contingent social exchanges that take place
amongst group members. To tackle this challenge, we propose the
Multiparty-Transformer (Multipar-T), a transformer model for multiparty
behavior modeling. The core component of our proposed approach is the
Crossperson Attention, which is specifically designed to detect contingent
behavior between pairs of people. We verify the effectiveness of Multipar-T on
a publicly available video-based group engagement detection benchmark, where it
outperforms state-of-the-art approaches in average F-1 scores by 5.2% and
individual class F-1 scores by up to 10.0%. Through qualitative analysis, we
show that our Crossperson Attention module is able to discover contingent
behavior. Comment: 7 pages, 4 figures, IJCA
Fine-Grained Socioeconomic Prediction from Satellite Images with Distributional Adjustment
While measuring socioeconomic indicators is critical for local governments to
make informed policy decisions, such measurements are often unavailable at
fine-grained levels like municipality. This study employs deep learning-based
predictions from satellite images to close the gap. We propose a method that
assigns a socioeconomic score to each satellite image by capturing the
distributional behavior observed in larger areas based on the ground truth. We
train an ordinal regression scoring model and adjust the scores to follow the
common power law within and across regions. Evaluation based on official
statistics in South Korea shows that our method outperforms previous models in
predicting population and employment size at both the municipality and grid
levels. Our method also demonstrates robust performance in districts with
uneven development, suggesting its potential use in developing countries where
reliable, fine-grained data is scarce.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres. This protocol produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
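The idea of calibrating OD against a serial dilution of particles with known counts can be illustrated with a minimal sketch. This is not the study's protocol or analysis code: the function names, the blank value, the dilution factor, and all numbers are hypothetical, and the fit is a plain least-squares line through the origin that assumes every well lies within the instrument's linear range.

```python
def serial_dilution(start_count, steps, factor=2.0):
    """Particle counts per well for a serial dilution (hypothetical setup)."""
    return [start_count / factor**i for i in range(steps)]


def fit_particles_per_od(particle_counts, od_values, od_blank=0.0):
    """Least-squares slope through the origin.

    Models particles ~= slope * (OD - blank). Assumes all wells are in the
    linear range; saturated or near-blank wells should be excluded first.
    """
    net = [od - od_blank for od in od_values]
    numerator = sum(n * p for n, p in zip(net, particle_counts))
    denominator = sum(n * n for n in net)
    return numerator / denominator


def od_to_count(od, slope, od_blank=0.0):
    """Convert a blank-subtracted OD reading to an estimated particle count."""
    return slope * (od - od_blank)


# Synthetic, made-up data: OD rises by 1.0 per 1e9 particles, blank reads 0.05.
BLANK = 0.05
counts = serial_dilution(8e8, steps=5)
ods = [c / 1e9 + BLANK for c in counts]

slope = fit_particles_per_od(counts, ods, od_blank=BLANK)
estimate = od_to_count(0.35, slope, od_blank=BLANK)
```

With real plate-reader data the residuals of this fit would also indicate where the linear range ends, which is one reason a dilution series is more informative than a single reference point.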
Probabilistic cross-level imputation framework using individual auxiliary information
In healthcare-related studies, individual patient or hospital data are often not publicly available due to privacy restrictions, legal issues, or reporting norms. However, such measures may be provided at a higher or more aggregated level, such as state-level or county-level summaries, or averages over health zones such as Hospital Referral Regions (HRR) or Hospital Service Areas (HSA). Such levels constitute partitions over the underlying individual-level data, which may not match the groupings that would have been obtained if one clustered the data based on individual-level attributes. Moreover, treating aggregated values as representatives for the individuals can result in the ecological fallacy. How can one run data mining procedures on such data, where different variables are available at different levels of aggregation or granularity? In this thesis, we seek a better utilization of variably aggregated datasets, which are possibly assembled from different sources. We propose a novel "cross-level" imputation technique that models the generative process of such datasets using a Bayesian directed graphical model. The imputation is based on the underlying data distribution and is shown to be unbiased. This imputation can be further utilized in subsequent predictive modeling, yielding improved accuracy. Experimental results using a simulated dataset and the Behavioral Risk Factor Surveillance System (BRFSS) dataset are provided to illustrate the generality and capabilities of the proposed framework. Electrical and Computer Engineering
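The ecological fallacy mentioned in this abstract can be made concrete with a small worked example. The sketch below is only an illustration, not the thesis's imputation model: the three groups and their values are invented, and `pearson` is a helper written here for the purpose. Within every group the two variables move in opposite directions, yet the group averages line up perfectly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented individual-level data: (x, y) pairs for three regions.
groups = {
    "region_a": [(1, 3), (2, 2), (3, 1)],
    "region_b": [(4, 6), (5, 5), (6, 4)],
    "region_c": [(7, 9), (8, 8), (9, 7)],
}

# Correlation inside each region (individual level).
within = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in groups.items()
}

# Correlation between region averages (aggregate level).
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in groups.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in groups.values()]
between = pearson(mean_x, mean_y)

for name, r in within.items():
    print(f"{name}: within-group r = {r:+.2f}")
print(f"between group means: r = {between:+.2f}")
```

Here the aggregate-level correlation is strongly positive while every within-group correlation is strongly negative, so reading the regional averages as a description of individuals would give the wrong sign.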